METHOD AND APPARATUS FOR PERFORMING MODULAR ARITHMETIC
Background of the invention 

Many electronic interactions require a certain level of security to ensure that
the data contained in a message transfer is difficult to intercept and decode,
and/or is capable of being verified as genuine. To achieve these ends, it is
possible to encrypt data according to one of many possible schemes. A popular
scheme is public key cryptography (e.g. PGP). Public key cryptography enables a
particular message to be encoded according to an individual's private key and a
third party's public key; both are long fixed numbers. The message may then be
decoded by the third party through use of their private key. In this way, each
party may keep their private key secret and thus control who is able to receive
and decode any given message.

One of the key elements of encryption systems is the ability to perform modular
arithmetic. The basic calculation which is performed may be written as:

$$S = A B \bmod N \qquad (1)$$

where A, B and N are large numbers, typically including many hundreds of digits.

Cryptography systems are generally mathematically complex and can impose a high
computational overhead on any system which implements them.

Description of the Prior Art

Prior art systems for performing modular arithmetic make use of Montgomery's
theorem, which has been used in many software and hardware implementations of
modular arithmetic algorithms. Implementations using Montgomery's theorem are
able to compute a value for S without first multiplying A and B and then
dividing by N. Most of the hardware implementations rely on an iterative
approach which decomposes A into k blocks of p bits to limit the size of the
hardware operators required. Further advances have used a serial architecture
to further reduce the circuit size. Such architectures are generally based
around two serial multipliers, FIFO elements and the pre-computation of a
constant J_0, such that:

$$J_0\,N \equiv -1 \pmod{2^p} \qquad (2)$$

k and p are both positive integers, and the binary representation of a positive
integer X, where X < 2^(kp), may be given by:

$$X = \sum_{i=0}^{kp-1} X[i]\,2^{i} \qquad (3)$$

where 0 ≤ X[i] < 2, i.e. X[i] may be either 0 or 1.



Throughout this specification, square brackets [ ] refer to a particular bit
position in a multi-bit word, e.g. X[i] refers to the i-th bit of word X. Angle
brackets < > refer to a particular block of a multi-bit word, e.g. X<i> refers
to the i-th block of word X. Parentheses ( ) refer to the value of a word at a
particular iteration of a loop function, e.g. X(i) refers to the value of word
X at the i-th iteration.

A definition for X[l:k], where l > k, is that X[l:k] is a positive integer
having a total length of l+1-k bits, such that X[l] is the MSB and X[k] is the
LSB.

The base 2^p representation of X is given by:

$$X = \sum_{i=0}^{k-1} X\langle i\rangle\,2^{ip} \qquad (4)$$

where 0 ≤ X<i> < 2^p.
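The bit, slice and block notations defined above map directly onto shifts and masks. The following Python sketch is illustrative only; the helper names `bit`, `bit_slice`, `to_blocks` and `from_blocks` are assumptions introduced here and do not appear in the specification.

```python
def bit(x, i):
    """X[i]: the i-th bit of x (bit 0 is the LSB)."""
    return (x >> i) & 1


def bit_slice(x, l, k):
    """X[l:k]: the l+1-k bits of x from bit l (MSB) down to bit k (LSB)."""
    return (x >> k) & ((1 << (l + 1 - k)) - 1)


def to_blocks(x, k, p):
    """X<0> .. X<k-1>: the k base-2^p digits of x, least significant first."""
    mask = (1 << p) - 1
    return [(x >> (i * p)) & mask for i in range(k)]


def from_blocks(blocks, p):
    """Equation (4): rebuild x as the sum of X<i> * 2^(i*p)."""
    return sum(block << (i * p) for i, block in enumerate(blocks))


x = 0xC7197F0E                     # sample 32-bit word
blocks = to_blocks(x, k=8, p=4)    # eight 4-bit blocks
assert from_blocks(blocks, p=4) == x
```

Splitting a word into blocks and summing the weighted blocks again is lossless as long as x < 2^(kp), which is the condition stated for equation (3).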

In the following description, it is assumed that N is an odd integer such that
2^((k-1)p) < N < 2^(kp), and that both A and B are less than N. A p-bit
constant, J_0, is thus defined as:



$$J_0\,N\langle 0\rangle \equiv -1 \pmod{2^p} \qquad (5)$$

N is the modulus number which is used in public key cryptography systems. It is
defined as the product of two large prime numbers (i.e. much greater than 2)
and must therefore be odd.
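Because N is odd, its least significant block N<0> is invertible modulo 2^p, so the constant J_0 of equation (5) always exists. A minimal Python sketch of the pre-computation that the prior art requires is given below; the function name `montgomery_constant` is an assumption made for illustration, and the three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
def montgomery_constant(n, p):
    """Return J_0 such that J_0 * N<0> == -1 (mod 2^p), for odd n."""
    modulus = 1 << p
    n0 = n & (modulus - 1)          # N<0>, the least significant block of N
    if n0 % 2 == 0:
        raise ValueError("N must be odd for J_0 to exist")
    # pow(x, -1, m) computes a modular inverse (Python 3.8 or later).
    return (-pow(n0, -1, modulus)) % modulus


n = 0xD077EC53_F4AA27A4_D7816723    # odd sample modulus
j0 = montgomery_constant(n, p=4)
assert (j0 * (n & 0xF)) % 16 == 15  # 15 is -1 modulo 2^4
```

This modular inversion is the kind of pre-computation that a scheme working without J_0 avoids.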

The prior art hardware implementation of the Montgomery theorem may be
described by the following pseudo-code.

1. procedure MM-BASIC(A, B, N)
2.   S(-1) = 0
3.   for i = 0 to k-1
4.     T = S(i-1) + A<i>B
5.     Y_0 = (T J_0) mod 2^p
6.     S(i) = (T + N Y_0) / 2^p
7.     if S(i) >= N then S(i) = S(i) - N
8.   end for
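The MM-BASIC loop can be sketched in Python using ordinary big integers instead of serial hardware. This is a hedged illustration, not the patented circuit: the function name `mm_basic` is an assumption, J_0 is obtained with Python's modular inverse, and the cross-check relies on the standard property that k Montgomery steps of p bits yield A.B.2^(-kp) mod N.

```python
def mm_basic(a, b, n, k, p):
    """Blockwise Montgomery multiplication following MM-BASIC."""
    r = 1 << p
    mask = r - 1
    j0 = (-pow(n & mask, -1, r)) % r       # pre-computed constant J_0
    s = 0                                  # S(-1) = 0
    for i in range(k):
        a_i = (a >> (i * p)) & mask        # A<i>, the i-th p-bit block of A
        t = s + a_i * b                    # T = S(i-1) + A<i>B
        y0 = ((t & mask) * j0) & mask      # Y_0 = (T J_0) mod 2^p
        s = (t + n * y0) >> p              # exact: T + N Y_0 is a multiple of 2^p
        if s >= n:                         # single conditional subtraction
            s -= n
    return s


a = 0xC7197F0E
b = 0xCCEFBAE4_77AF9EE5_848D8AE6
n = 0xD077EC53_F4AA27A4_D7816723
k, p = 24, 4                               # 24 blocks of 4 bits = 96 bits
expected = (a * b * pow(1 << (k * p), -1, n)) % n
assert mm_basic(a, b, n, k, p) == expected
```

Only the low p bits of T influence Y_0, which is why the sketch masks T before multiplying by J_0.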

The implementation of this pseudo-code in hardware is shown in a simplified
form in Figure 1. The architecture is constructed in serial form so that one
bit of the solution is generated for each clock cycle. Such an architecture, as
opposed to a parallel one, minimises the amount of hardware required at the
expense of speed.

The circuit of Figure 1 is arranged to receive five different input signals:
A[k] 200; B[t] 205; S(i-1) 210; GE(i-1) 215; and N[t] 220.

Serial Multiplier 110 accepts as inputs a fixed p-bit word, A<i>, produced by
register 105, and a one-bit data stream B[t] 205. It then acts to produce the
output, (A<i>.B), one bit at a time.

Multiplier 110 is configured internally as shown in Figure 2. The two inputs
are the output 340 of register 105 and B[t] 205. The two inputs 205, 340 are
ANDed together in AND gate 300. The result of this operation is fed into Carry
Save Adder 310, along with two other inputs. The first of these other inputs is
the carry output (C) derived from the fed back output from p-bit register 315.
The other input to the adder is derived from the result output (R) of p-bit
register 320, which has been divided by 2 in divider 305. Registers 315, 320
are positioned immediately after the Carry Save Adder 310 and each receives one
of the twin outputs produced by the adder.

The Carry Save Adder 310 is arranged to transform a sum of three numbers into a
sum of two numbers such that:

$$2C + R = X + Y + Z \qquad (6)$$

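The Carry Save Adder 310 computes C(t) and R(t) based on the following bitwise
Boolean equations, in which the carry is the bitwise majority of the three
inputs and the result is their bitwise exclusive OR:

$$C(t) = \big(C(t-1) \wedge R(t-1)/2\big) \vee \big(C(t-1) \wedge B[t]\,A\langle i\rangle\big) \vee \big(R(t-1)/2 \wedge B[t]\,A\langle i\rangle\big) \qquad (7)$$

$$R(t) = C(t-1) \oplus R(t-1)/2 \oplus B[t]\,A\langle i\rangle \qquad (8)$$

where ∧ denotes AND, ∨ denotes OR and ⊕ denotes exclusive OR.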
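The carry-save relation of equation (6) can be checked with a few lines of Python. This is a minimal sketch on whole integers, assuming a generic three-input adder; the name `carry_save_add` is introduced here for illustration only.

```python
def carry_save_add(x, y, z):
    """Reduce three addends to a (carry, result) pair with 2*C + R = X + Y + Z."""
    r = x ^ y ^ z                        # bitwise sum without carries
    c = (x & y) | (x & z) | (y & z)      # bitwise majority: the carries
    return c, r


c, r = carry_save_add(0b1011, 0b0110, 0b1101)
assert 2 * c + r == 0b1011 + 0b0110 + 0b1101
```

No carry propagates between bit positions inside the adder, which is what makes the structure attractive for a one-bit-per-cycle serial datapath.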

In a simplified notation:

C(t), R(t) = SERIAL_MULT(B[t].A<i>, C(t-1), R(t-1))     (9)

The procedure MM-BASIC, already shown, may be written in a form which shows
the serial operations explicitly:



1. procedure MM-SERIAL(A, B, N)
2.   S(-1) = 0
3.   GE(-1) = 0
4.   for i = 0 to k-1
5.     # computation of Y_0
6.     for t = 0 to p-1
7.       C_S1(t), R_S1(t) = SERIAL_SUB(C_S1(t-1), GE(i-1).N[t], S(i-1)[t])
8.       C_M1(t), R_M1(t) = SERIAL_MULT(B[t].A<i>, C_M1(t-1), R_M1(t-1))
9.       C_A1(t), R_A1(t) = SERIAL_ADD(C_A1(t-1), R_M1(t)[0], R_S1(t))
10.      C_M2(t), R_M2(t) = SERIAL_MULT(R_A1(t).J_0, C_M2(t-1), R_M2(t-1))
11.      Y_0[t] = R_M2(t)
12.    end for
13.    # main loop: computation of S(i)
14.    for t = 0 to kp+p-1
15.      C_S1(t), R_S1(t) = SERIAL_SUB(C_S1(t-1), GE(i-1).N[t], S(i-1)[t])
16.      C_M1(t), R_M1(t) = SERIAL_MULT(B[t].A<i>, C_M1(t-1), R_M1(t-1))
17.      C_A1(t), R_A1(t) = SERIAL_ADD(C_A1(t-1), R_M1(t)[0], R_S1(t))
18.      C_M2(t), R_M2(t) = SERIAL_MULT(R_A1(t).J_0, C_M2(t-1), R_M2(t-1))
19.      C_A2(t), R_A2(t) = SERIAL_ADD(C_A2(t-1), R_M2(t)[0], R_A1(t))
20.      S(i)[t-p] = R_A2(t)
21.      SGE(t) = SERIAL_GE(SGE(t-1), N[t-p], S(i)[t-p])
22.    end for
23.    GE(i) = SGE(kp+p-1)
24.  end for

The total number of clock cycles required to compute the result according to
the above scheme is k(kp+2p).

Summary of the Present Invention 

In a first broad form the present invention provides apparatus having inputs A,
B and N, and an output S, said apparatus being arranged to perform a modular
operation, S = A.B mod N, the apparatus including a 2-stage Carry Save Adder
(2-CSA) and a 1-stage Carry Save Adder (1-CSA), the 2-CSA being arranged to
receive 5 input signals:

□ U_0, being the partial product of N and Y_0;
□ U_1, being the subtraction of a previous version of S and U_8, wherein U_8 is
  either N or 0 depending on the value of the comparison between the result of
  the previous iteration and N;
□ U_2, being the partial product of B with the current version of A;
□ U_3, being S/2;
□ U_4, being the carry output of the 1-CSA;

where result and carry outputs of the 2-CSA form two of three inputs to the
1-CSA, wherein the result (R) output of the 1-CSA is the desired result (S),
and the third input to the 1-CSA is a compensation signal arranged to allow S
to be calculated without knowing the constant J_0, where J_0 N<0> = -1 mod 2^p,
where p is a block length into which A is sub-divided.

In a second broad form, the present invention provides an iterative method of
performing a modular operation of S = A.B mod N, where A, B and N are encoded
as multi-bit digital words, including the following steps:

a) setting S(-1) to 0, and i to 0;
b) setting S(i) to (S(i-1) + A<i>B + N Y_0)/2^p;
c) setting S(i) to (S(i) - N) if S(i) ≥ N;
d) repeating steps b) and c) k times;

wherein:

i is a loop counter;
k is a number of blocks of p bits length into which A is divided;
Y_0 = ((T J_0) mod 2^p);
J_0 N = -1 mod 2^p; and
Y_0 is calculated one bit at a time, based on the fact that (T + N Y_0) is a
multiple of 2^p.

Other features and benefits of the invention will become apparent in the
following description of various embodiments of the invention.

Brief Description of the Drawings 

For a better understanding of the present invention and to understand how the
same may be brought into effect, the invention will now be described by way of
example only, with reference to the appended drawings in which:

Figure 1 shows a simplified prior art circuit for implementing modular
arithmetic according to Montgomery's theorem;

Figure 2 shows a prior art serial/parallel multiplier or carry save adder;

Figure 3 shows a merged multiplier as used in embodiments of the invention; and

Figure 4 shows a hardware implementation according to an embodiment of the
invention.

Detailed Description of the Preferred Embodiments 

The present invention retains a serial architecture to accomplish the
calculation, but embodiments of the invention do not require pre-knowledge of
the constant J_0. Embodiments of the invention calculate Y_0 = ((T.J_0) mod 2^p)
one bit at a time, based on the fact that (T + N Y_0) must be a multiple of 2^p.
In this way, the complex mathematical functions required to pre-compute J_0 can
be dispensed with.

With this implicit knowledge, the procedure MM-BASIC described previously may
now be written as MM-SIMPLE:

1. procedure MM-SIMPLE(A, B, N)
2.   S(-1) = 0
3.   for i = 0 to k-1
4.     S(i) = (S(i-1) + A<i>B + N Y_0) / 2^p
5.     if S(i) >= N then S(i) = S(i) - N
6.   end for

The above serial implementation of MM-SIMPLE is more efficient than the prior
art implementation of MM-BASIC as the two multipliers required in the prior art
can be merged into a single multiplier in embodiments of the invention. The
gain, in terms of fewer components, is a total of 2p registers plus the two
serial adders 120 and 155. The removal of the need for these components removes
a significant amount of circuitry, and thus the resulting architecture requires
less space and consumes less power to achieve the same result. It also
calculates the result in fewer clock cycles.

Figure 3 shows the resultant hardware implementation which may be used to
perform the steps of procedure MM-SIMPLE presented above.

Y_0 is computed bit by bit during the first p cycles of the loop, starting at
line 15 of the procedure MM-SERIAL. Assume that, at cycle q < p, the bits
0, 1, ..., q-1 have already been computed, leaving only bit q to be discovered.

According to embodiments of the present invention, if, at cycle q, the LSB of
the 2-stage Carry Save Adder shown in Figure 3 is '1', then N[q:0] is added to
the intermediate result, and Y_0[q] = 1.

This may be proved as follows. At the q-th step, the intermediate values from
the first Carry Save Adder may be given as:

$$S = 2C + R \qquad (10)$$

$$S = \big(A\langle i\rangle\,B[q:0] + Y_0[q-1:0]\,N[q:0] + S(i-1)[q:0]\big) / 2^{q} \qquad (11)$$

Assuming that the q-th bit of Y_0 is a '1', then the above equation may be
re-written as:

$$S' = \big(A\langle i\rangle\,B[q:0] + (2^{q} + Y_0[q-1:0])\,N[q:0] + S(i-1)[q:0]\big) / 2^{q} \qquad (12)$$

$$S' = S + N[q:0] \qquad (13)$$

As the LSB of N is always 1, since N is the product of two large prime numbers
and, therefore, odd, it can be seen from the above equations that the LSBs of S
and S' are always complementary. Therefore, it is possible to guarantee that
the LSB of the result is 0 in the first p steps by choosing either S or S'. The
choice of S' implies that the q-th bit of Y_0 must be forced to equal 1.

The above step is repeated at each cycle q < p, so that at the end all bits of
Y_0 are discovered.
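The argument above can be exercised in software. The Python sketch below discovers Y_0 one bit per step by adding N whenever the running LSB is 1, then uses that inside the MM-SIMPLE loop. It works on whole integers rather than serial bit streams, and the names `reduce_block` and `mm_simple` are assumptions for illustration. J_0 is computed only to cross-check the result; the reduction itself never uses it.

```python
def reduce_block(t, n, p):
    """Return (Y_0, (t + n*Y_0) / 2^p), finding Y_0 one bit per step."""
    y0 = 0
    s = t
    for q in range(p):
        if s & 1:            # LSB is 1: choose S' = S + N, which flips the LSB
            s += n
            y0 |= 1 << q     # bit q of Y_0 is forced to 1
        s >>= 1              # LSB is now 0, so the division by 2 is exact
    return y0, s


def mm_simple(a, b, n, k, p):
    """MM-SIMPLE on big integers: no pre-computed J_0 is needed."""
    mask = (1 << p) - 1
    s = 0
    for i in range(k):
        a_i = (a >> (i * p)) & mask
        _, s = reduce_block(s + a_i * b, n, p)
        if s >= n:
            s -= n
    return s


a = 0xC7197F0E
b = 0xCCEFBAE4_77AF9EE5_848D8AE6
n = 0xD077EC53_F4AA27A4_D7816723
y0, s = reduce_block(a * b, n, p=32)
j0 = (-pow(n, -1, 1 << 32)) % (1 << 32)     # used only as a cross-check
assert y0 == (a * b * j0) % (1 << 32)
assert s == (a * b + n * y0) >> 32
```

Each loop iteration mirrors one cycle q < p of the description: the conditional addition of N is the software counterpart of selecting S' instead of S.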

The procedure MM-SERIAL-SIMPLE shown below is a pseudo-code implementation of
an embodiment of the present invention, and is a version of the previously
presented MM-SERIAL adapted according to the above results.

1. procedure MM-SERIAL-SIMPLE(A, B, N)
2.   S(-1) = 0
3.   GE(-1) = 0
4.   for i = 0 to k-1
5.     # main loop: computation of S(i)
6.     for t = 0 to kp+p-1
7.       C_S1(t), R_S1(t) = SERIAL_SUB(C_S1(t-1), GE(i-1).N[t], S(i-1)[t])
8.       C_int, R_int = 2-STAGE_CSA(B[t].A<i>, C_M(t-1), R_M(t-1), R_S1(t), N[t].Y_0)
9.       if t < p and R_int[0] = 1 then
10.        C_M(t), R_M(t) = CSA(N[t:0], C_int, R_int)
11.        Y_0[t] = 1
12.      else
13.        C_M(t), R_M(t) = C_int, R_int
14.      end if
15.      S(i)[t-p] = R_M(t)[0]
16.      SGE(t) = SERIAL_GE(SGE(t-1), N[t-p], S(i)[t-p])
17.    end for
18.    GE(i) = SGE(kp+p-1)
19.  end for



The conditional statement at line 9 of the above procedure may be considered to
trigger a compensation event which, if t < p and R_int[0] = 1, causes the value
of register 525 (N_del) to be applied to the input of the 1-stage CSA (1-CSA)
540. If the condition is not satisfied, then the C and R outputs of the 2-stage
CSA (2-CSA) 520 merely feed straight into the 1-CSA and no compensation is
performed.

It is the addition of the compensation function which directly removes the need
to explicitly compute J_0.

In Figure 4, the compensation function is implemented by register 525, AND gate
530 and MUX 535. The MUX 535 effectively performs the conditional IF statement
of line 9 of MM-SERIAL-SIMPLE: if R_int[0] is equal to 1, then the contents of
register 525 are applied to 1-CSA 540.

The above procedure (MM-SERIAL-SIMPLE) is further explained in the procedure
below (MM-SERIAL-SIMPLE_enhanced), which includes further details on selected
ones of the internal signal nets.

These internal nets are labelled from U_0 to U_8 and directly correspond with
selected internal nets shown in Figure 4.



1. procedure MM-SERIAL-SIMPLE_enhanced(A, B, N)
2.   S(-1) = 0
3.   GE(-1) = 0
4.   A_next = A[p-1:0]
5.   for i = 0 to k-1
6.     # main loop: computation of S(i)
7.     N_del = 0
8.     Y_0 = 0
9.     R = 0
10.    C = 0
11.    A_current = A_next
12.    A_next = A[(i+2)p-1:(i+1)p]
13.    for t = 0 to kp+p-1
14.      U_0 = AND2(N[t], Y_0)
15.      U_8 = AND1(GE(i-1), N[t])
16.      U_1 = SUB1(U_8, S(i-1)[t])
17.      U_2 = AND3(B[t], A_current)
18.      U_3 = R/2
19.      U_4 = C
20.      C_int, R_int = 2-STAGE-CSA(U_0, U_1, U_2, U_3, U_4)
21.      U_7 = MUX(R_int[0], 0)
22.      U_5 = AND4(U_7, N_del)
23.      if t < p then
24.        Y_0[t] = U_7
25.        N_del[t] = N[t]
26.        U_6 = 0
27.      else
28.        # N_del acts as a shift register
29.        U_6 = N_del[0]
30.        N_del = N_del/2
31.        N_del[p-1] = N[t]
32.      end if
33.      C, R = CSA(U_5, C_int, R_int)
34.      S(i)[t] = R[0]
35.      SGE(t) = GE(U_6, R[0])
36.    end for
37.    GE(i) = SGE(kp+p-1)
38.  end for



As an example, presented below are details of how an embodiment of the
invention operates on some sample input data. The following inputs are
provided, in 32-bit format:

A = C7197F0E
B = CCEFBAE4_77AF9EE5_848D8AE6
N = D077EC53_F4AA27A4_D7816723

The result of the Montgomery multiplication of A by B is given by
(AB + N Y_0)/2^p. Before the computation starts, the registers of the
multiplier are initialised as follows.

N_0 = 00000003   Y_0 = 00000000   RC = 0_00000000   B[t] = 6   N[t] = 3

For the sake of simplicity, the registers R and C have been summed into
register RC, and the computation is performed 4 bits (a nibble) at a time, thus
setting p=4.

1. Computation of the intermediate results, based on the partial products

     N[t].Y_0     = 0_00000000
    +B[t].A       = 4_AA98FA54
    +RC/16        = 0_00000000
    =Intermediate = 4_AA98FA54

2. Find the first 4 bits of the compensation value (Z) such that the 4 LSBs of
   Intermediate + Z.N_0 are all zero.

    Z = 4

3. Add the partial product Z.N_0 to Intermediate

     Intermediate = 4_AA98FA54
    +Z.N_0        = 0_0000000C
    =RC           = 4_AA98FA60

4. Update the registers with the new values and restart the cycle

    N_0 = 00000023   Y_0 = 00000004   RC = 4_AA98FA60   B[t] = E   N[t] = 2

1. Computation of the intermediate results, based on the partial products

     N[t].Y_0     = 0_00000008
    +B[t].A       = A_E364F2C4
    +RC/16        = 0_4AA98FA6
    =Intermediate = B_2E0E8272

2. Find the first 4 bits of the compensation value (Z) such that the 4 LSBs of
   Intermediate + Z.N_0 are all zero.

    Z = A

3. Add the partial product Z.N_0 to Intermediate

     Intermediate = B_2E0E8272
    +Z.N_0        = 0_0000015E
    =RC           = B_2E0E83D0

4. Update the registers with the new values and restart the cycle

    N_0 = 00000723   Y_0 = 000000A4   RC = B_2E0E83D0   B[t] = A   N[t] = 7



1. Computation of the intermediate results, based on the partial products

     N[t].Y_0     = 0_0000047C
    +B[t].A       = 7_C6FEF68C
    +RC/16        = 0_B2E0E83D
    =Intermediate = 8_79DFE345

2. Find the first 4 bits of the compensation value (Z) such that the 4 LSBs of
   Intermediate + Z.N_0 are all zero.

    Z = 9

3. Add the partial product Z.N_0 to Intermediate

     Intermediate = 8_79DFE345
    +Z.N_0        = 0_0000403B
    =RC           = 8_79E02380

4. Update the registers with the new values and restart the cycle

    N_0 = 00006723   Y_0 = 000009A4   RC = 8_79E02380   B[t] = 8   N[t] = 6



This process is repeated until all the bits of Y_0 are discovered. At this
stage, the compensation phase is no longer needed so the computation iterates
over the remaining bits of B and N. The step by step result at each phase is
given by the following table:

Cycle  N_0       Y_0       RC          B[t]  N[t]
0      XXXXXXXX  XXXXXXXX  XXXXXXXXXX  X     X
1      00000003  00000000  0000000000  6     3
2      00000023  00000004  04AA98FA60  E     2
3      00000723  000000A4  0B2E0E83D0  A     7
4      00006723  000009A4  0879E02380  8     6
5      00016723  000009A4  06C06A3480  D     1
6      00816723  000A09A4  0A88602800  8     8
7      07816723  000A09A4  06E1A24810  4     7
8      D7816723  090A09A4  03CE530470  8     D
9      D7816723  790A09A4  0CCFBD7800  5     4
10     4D781672  790A09A4  0694A37956  E     A
11     A4D78167  790A09A4  1007138AC1  E     7
12     7A4D7816  790A09A4  0F331C6EEC  9     2
13     27A4D781  790A09A4  08E52B51B4  F     A
14     A27A4D78  790A09A4  10F3358755  A     A
15     AA27A4D7  790A09A4  0D9096AF69  7     4
16     4AA27A4D  790A09A4  082EE40AE8  7     F
17     F4AA27A4  790A09A4  0D0C374AAC  4     3
18     3F4AA27A  790A09A4  0558478DCE  E     5
19     53F4AA27  790A09A4  00961B9BD4  A     C
20     C53F4AA2  790A09A4  0E4CD923F9  B     E
21     EC53F4AA  790A09A4  1011728ED1  F     7
22     7EC53F4A  790A09A4  0FFADBDE3B  E     7
23     77EC53F4  790A09A4  0F3258F423  C     0
24     077EC53F  790A09A4  0A485783EA  C     D
25     D077EC53  790A09A4  101F39EA3A  0     0
26     0D077EC5  790A09A4  0101F39EA3  0     0
27     00D077EC  790A09A4  00101F39EA  0     0
28     000D077E  790A09A4  000101F39E  0     0
29     0000D077  790A09A4  0000101F39  0     0
30     00000D07  790A09A4  00000101F3  0     0
31     000000D0  790A09A4  000000101F  0     0
32     0000000D  790A09A4  0000000101  0     0
33     00000000  790A09A4  0000000010  0     0
34     00000000  790A09A4  0000000001  0     0

Notice that the serial output result can be read directly as the rightmost
nibble of the RC column. It is also interesting to notice the shifting pattern
of N_0. From cycle 1 to 8, the register behavior is comparable to a stack,
where the nibbles are pushed from the left. From cycle 9 onward, the register
behaves as a right shift register. The output of this register shall be used as
the input of a comparator which detects if the result is greater than or equal
to N.

Y_0 = 790A09A4
RESULT = 1_01F39EA3_AA3B194E_C8954C16_00000000
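The nibble-by-nibble walk-through above can be replayed with a short Python model. It is a sketch under stated assumptions: registers are modelled as unbounded integers, RC holds the sum of R and C as in the example, and the names `nibble`, `run_example` and `Y_NIBBLES` are introduced here for illustration. The compensation value Z is found one bit at a time, so no constant J_0 is ever computed.

```python
A = 0xC7197F0E
B = 0xCCEFBAE4_77AF9EE5_848D8AE6
N = 0xD077EC53_F4AA27A4_D7816723
Y_NIBBLES = 8          # Y_0 is 32 bits wide in the example


def nibble(x, t):
    """Nibble t of x (nibble 0 is the least significant)."""
    return (x >> (4 * t)) & 0xF


def run_example(a, b, n, cycles):
    n_del = y0 = rc = result = 0
    history = []                       # value of RC after each cycle
    for t in range(cycles):
        # Step 1: intermediate result from the partial products.
        inter = nibble(n, t) * y0 + nibble(b, t) * a + (rc >> 4)
        if t < Y_NIBBLES:
            # Steps 2 and 3: find Z bit by bit while adding Z.N_0.
            n_del |= nibble(n, t) << (4 * t)
            z = 0
            for q in range(4):
                if (inter >> q) & 1:
                    inter += n_del << q
                    z |= 1 << q
            y0 |= z << (4 * t)
        else:
            # Compensation is over: the low nibble of RC is a result nibble.
            result |= (inter & 0xF) << (4 * (t - Y_NIBBLES))
        rc = inter                     # Step 4: update RC
        history.append(rc)
    return y0, result, history


y0, result, history = run_example(A, B, N, cycles=34)
assert history[0] == 0x4_AA98FA60      # RC after the first cycle
assert y0 == 0x790A09A4                # Y_0 as listed in the table
assert result == (A * B + N * y0) >> 32
print(f"Y_0    = {y0:08X}")
print(f"RESULT = {result:X}")
```

The value returned as `result` is the sum A.B + N.Y_0 with its 32 zero low-order bits dropped, so it may still exceed N; the final conditional subtraction of N is left out of this model.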



In the light of the foregoing description, it will be clear to the skilled man
that various modifications may be made within the scope of the invention.

The present invention includes any novel feature or combination of features
disclosed herein either explicitly or any generalisation thereof irrespective
of whether or not it relates to the claimed invention or mitigates any or all
of the problems addressed.

CLAIMS

1. Apparatus having inputs A, B and N, and an output S, said apparatus being
arranged to perform a modular operation, S = A.B mod N, the apparatus including
a 2-stage Carry Save Adder (2-CSA) and a 1-stage Carry Save Adder (1-CSA), the
2-CSA being arranged to receive 5 input signals:

□ U_0, being the partial product of N and Y_0;
□ U_1, being the subtraction of a previous version of S and U_8, wherein U_8 is
  either N or 0 depending on the value of the comparison between the result of
  the previous iteration and N;
□ U_2, being the partial product of B with the current version of A;
□ U_3, being S/2;
□ U_4, being the carry output of the 1-CSA;

where result and carry outputs of the 2-CSA form two of three inputs to the
1-CSA, wherein the result (R) output of the 1-CSA is the desired result (S),
and the third input to the 1-CSA is a compensation signal arranged to allow S
to be calculated without knowing the constant J_0, where J_0 N<0> = -1 mod 2^p,
where p is a block length into which A is sub-divided.

2. Apparatus as claimed in claim 1 wherein the compensation signal is arranged
to equal a delayed version of N in the event that t < p and the Result (R)
output of the 2-CSA equals '1'.

3. Apparatus as claimed in any one of the preceding claims wherein the 2-CSA
includes two 1-CSA arranged in series.

4. Apparatus as claimed in any one of the preceding claims wherein while
processing bits 0 to p-1, register Y_0 is arranged such that the LSB of the
Result (R) output of the 1-CSA is always '0'.

5. Apparatus as claimed in any one of the preceding claims wherein the
apparatus is arranged to take the form of a custom integrated circuit.

6. Apparatus as claimed in claim 5 wherein the custom integrated circuit
includes a digital signal processor (DSP).

7. An iterative method of performing a modular operation of S = A.B mod N,
where A, B and N are encoded as multi-bit digital words, including the
following steps:

a) setting S(-1) to 0, and i to 0;
b) setting S(i) to (S(i-1) + A<i>B + N Y_0)/2^p;
c) setting S(i) to (S(i) - N) if S(i) ≥ N;
d) repeating steps b) and c) k times;

wherein:

i is a loop counter;
k is a number of blocks of p bits length into which A is divided;
Y_0 = ((T.J_0) mod 2^p);
J_0 N = -1 mod 2^p; and
Y_0 is calculated one bit at a time, based on the fact that (T + N Y_0) is a
multiple of 2^p.

ABSTRACT

METHOD AND APPARATUS FOR PERFORMING MODULAR ARITHMETIC

An apparatus and method is disclosed for performing the modular operation
S = AB mod N. The apparatus is arranged such that the constant J_0, which is
ordinarily required in order to complete the operation, is not required to be
explicitly computed, thus simplifying and speeding up the operation.

Figure 4

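FIGURE 1 (drawing not reproduced). Reference numerals shown: inputs 200, 205,
210, 215, 220; elements 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150,
155, 160, 165; outputs 250, 255.

FIGURE 2 (drawing not reproduced). Reference numerals shown: inputs 205, 340;
elements 300, 305, 310, 315, 320; output 350.

FIGURE 3 (drawing not reproduced). Reference numerals shown: 400, 410, 420,
465, 470, 475, 480, 485; output 490.

FIGURE 4 (drawing not reproduced). Labels shown: inputs A[k] 200, B[t] 205,
210, 215, N[t] 220; registers A_next, A_current and Y_0 (p bits wide); SUB and
AND blocks; nets U_0, U_1, U_2; register 525 (N_del); AND gate 530; 2-stage
Carry-Save Adder 520; MUX 535; Carry-Save Adder 540; GE block with GE(i);
further numerals 100, 105, 115, 125, 500, 505, 510, 515, 545, 550, 560, 565.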
